Abstract

Inference of topological and geometric attributes of a hidden manifold from its point data is a fundamental problem arising in many scientific studies and engineering applications. In this paper we present an algorithm to compute a set of loops from point data that presumably sample a smooth manifold $M \subset \mathbb{R}^d$. These loops approximate a shortest basis of the one-dimensional homology group $H_1(M)$ over coefficients in the finite field $\mathbb{Z}_2$. Previous results addressed the issue of computing the rank of the homology groups from point data, but there is no result on approximating the shortest basis of a manifold from its point sample. In arriving at our result, we also present a polynomial-time algorithm for computing a shortest basis of $H_1(\mathcal{K})$ for any finite simplicial complex $\mathcal{K}$ whose edges have non-negative weights.

1 Introduction



Inference of unknown structures from point data is a fundamental problem in many areas of science and 
engineering that has motivated widespread research [1, 13, 21, 23, 24, 25]. Typically, this data is assumed to
be sampled from a manifold sitting in a high dimensional space whose geometric and topological properties 
are to be derived from the data. In this work, we are particularly interested in computing a set of loops 
from data which not only captures the topology but is also aware of the geometry of the sampled manifold. 
Specifically, we aim to approximate a shortest basis for the one dimensional homology group from the data. 

Recently, a few algorithms for computing homology groups from point data have been developed. One approach would be to reconstruct the sampled space from its point data [6, 7, 12] and then apply known techniques for homology computations on triangulations [20]. However, this option is not very attractive since a full-blown reconstruction with known techniques requires costly computations with Delaunay triangulations in high dimensions. Chazal and Oudot [8] showed how one can use less constrained data structures such as Rips, Čech, and witness complexes to infer the rank of the homology groups by leveraging persistence algorithms [18, 25]. Among these, the Rips complexes are the easiest to compute, though they consume more space than the others, an issue which has started to be addressed [10].

The works considered so far focus only on computing the Betti numbers, the ranks of the homology groups. Although the persistence algorithms [18, 25] also provide representative cycles of a homology basis, they remain oblivious to the geometry of the manifold. As a result, these cycles do not have nice geometric properties. A natural question is the following: if the loops of the one-dimensional homology group are associated with a length under some metric, can one compute or approximate, in polynomial time, a shortest set of loops that generate the homology group? This question has been answered in the affirmative for the special case of surfaces represented by triangulations [19]. In fact, considerable progress has been made for this special case on various versions of the problem. We cannot apply these techniques, mainly because we deal with point data instead of an input triangulation. Also, these works either consider a surface [3, 4, 15, 16] instead of a manifold of arbitrary dimension in a Euclidean space, or use a local measure other than the lengths of the generators in a basis [9].

Our main result is an algorithm that computes a set of loops from a Rips complex of the given data, together with a proof that the lengths of the computed loops approximate those of a shortest basis of the one-dimensional homology group of the sampled manifold. In arriving at this result, we also show how to compute a shortest basis for the one-dimensional homology group of any finite simplicial complex whose edges have non-negative weights. Given that computing a shortest basis for $k$-dimensional homology groups of a simplicial complex over $\mathbb{Z}_2$ coefficients is NP-hard for $k \ge 2$ (Chen and Freedman [11]), this result settles the open case for $k = 1$.

1.1 Background and notations 

We use the concepts of homology groups, Čech and Rips complexes from algebraic topology, and geodesics from differential geometry. We briefly discuss them and introduce relevant notation here; readers can obtain the details from any standard book on these topics, such as [17, 20].

Homology groups and generators: A homology group of a topological space $T$ encodes its topological connectivity. We use $H_k(T)$ to denote its $k$-dimensional homology group over coefficients in $\mathbb{Z}_2$. Since $\mathbb{Z}_2$ is a field, $H_k(T)$ is a vector space and hence admits a basis whose size equals its rank. We are concerned with the 1-dimensional homology group $H_1(T)$. The elements of $H_1(T)$ are equivalence classes $[g]$ of 1-dimensional cycles $g$, also called loops. A set $\{[g_1], \ldots, [g_k]\}$ generating $H_1(T)$ is called a basis if $k = \mathrm{rank}(H_1(T))$. Simplifying the notation, we say $\{g_1, \ldots, g_a\}$ generate $H_1(T)$ if $\{[g_1], \ldots, [g_a]\}$ generate $H_1(T)$, and that they form a basis if $a = \mathrm{rank}(H_1(T))$. We assume that each loop $g$ in $T$ is associated with a non-negative weight $w(g)$. The length of a set of loops $G = \{g_1, \ldots, g_a\}$ is given by $\mathrm{Len}(G) = \sum_{i=1}^{a} w(g_i)$. A shortest set of generators or a shortest basis of $H_1(T)$ is a basis $G$ of $H_1(T)$ where $\mathrm{Len}(G)$ is minimal over all bases. When $T$ is a simplicial complex, all loops are restricted to its 1-skeleton.

Complexes: Let $B(p, r)$ denote an open Euclidean $d$-ball centered at $p$ with radius $r$. For a point set $P \subset \mathbb{R}^d$ and a real $r > 0$, the Čech complex $\mathcal{C}^r(P)$ is a simplicial complex where a simplex $\sigma \in \mathcal{C}^r(P)$ if and only if $\mathrm{Vert}(\sigma)$, the vertices of $\sigma$, are in $P$ and are the centers of $d$-balls of radius $r/2$ which have a non-empty common intersection, that is, $\bigcap_{p \in \mathrm{Vert}(\sigma)} B(p, r/2) \neq \emptyset$. If, instead of common intersection, we only require pairwise intersection among the $d$-balls, we get the Rips complex $\mathcal{R}^r(P)$. It is well known that the two complexes are related by a nesting property:

Proposition 1.1 For any finite set $P \subset \mathbb{R}^d$ and any $r > 0$, one has $\mathcal{C}^r(P) \subseteq \mathcal{R}^r(P) \subseteq \mathcal{C}^{2r}(P)$.
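
As an illustration of the Rips complex definition above, the following is a minimal Python sketch that builds the edges and triangles of the Rips complex by brute force. The function name `rips_complex` and the tuple-based representation of simplices are our own choices for illustration and are not part of the paper; an edge is included when the open balls of radius r/2 around its endpoints intersect, that is, when the endpoints are at distance less than r.

```python
from itertools import combinations
from math import dist


def rips_complex(points, r):
    """Return (edges, triangles) of the Rips complex with parameter r, up to dimension two.

    An edge (i, j) with i < j is present when dist(points[i], points[j]) < r,
    which is when the open balls of radius r/2 around the two points intersect.
    A triangle is present when all three of its edges are present.
    """
    n = len(points)
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) < r}
    triangles = {(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edges and (i, k) in edges and (j, k) in edges}
    return edges, triangles
```

This brute-force enumeration takes cubic time in the number of points and is only meant to make the definition concrete; it says nothing about how the complex should be built in practice.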

Geodesics: The vertex set $P$ of the simplicial complexes we consider is a dense sample of a smooth compact manifold $M \subset \mathbb{R}^d$ without boundary. Assume that $M$ is isometrically embedded, that is, $M$ inherits the metric from $\mathbb{R}^d$. For two points $p, q \in M$, a geodesic is a curve connecting $p$ and $q$ in $M$ whose acceleration has no component in the tangent spaces of $M$. Two points may have more than one geodesic, among which the ones with minimum length are called minimizing geodesics. Since $M$ is compact, any two points admit a minimizing geodesic. If $p$ and $q$ are close enough, this minimizing geodesic is unique, and we denote it by $\gamma(p, q)$. The lengths of minimizing geodesics induce a distance metric $d_M : M \times M \to \mathbb{R}$, where $d_M(p, q)$ is the length of a minimizing geodesic between $p$ and $q$. Clearly, $d(p, q) \le d_M(p, q)$, where $d(p, q)$ is the Euclidean distance. If $d(p, q)$ is small, Proposition 1.2 asserts that there is an upper bound on $d_M(p, q)$ in terms of $d(p, q)$. Our proof extends a result in Q where Belkin et al. show the same result on a surface in $\mathbb{R}^3$. See the Appendix for the proof. The reach $\rho(M)$ is defined as the minimum distance between $M$ and its medial axis.

Proposition 1.2 If $d(p, q) \le \rho(M)/2$, one has $d_M(p, q) \le \left(1 + \frac{4 d^2(p, q)}{3\rho^2(M)}\right) d(p, q)$.

Convexity radius and sampling: For a point $p \in M$, the set of all points $q$ with $d_M(p, q) < r$ forms $p$'s geodesic ball $B_M(p, r)$ of radius $r$. It is known that there is a positive real $r_p$ for each point $p \in M$ so that $B_M(p, r_p)$ is convex in the sense that the minimizing geodesics between any two points in $B_M(p, r_p)$ lie in $B_M(p, r_p)$. The convexity radius of $M$ is $\rho_c(M) = \inf_{p \in M} r_p$. We use Euclidean distances to define the sampling density. We say a discrete set $P \subset M$ is an $\varepsilon$-sample of $M$ if $B(x, \varepsilon) \cap P \neq \emptyset$ for each point $x \in M$.

1.2 Main results 

We present an algorithm that computes a set of loops $G = \{g_1, \ldots, g_k\}$ from an $\varepsilon$-sample $P$ of $M$ and a parameter $r > 0$, whose total length is within a factor of the total length of a shortest basis of $H_1(M)$. The factor depends on $\varepsilon$, $r$, and $\rho(M)$.

Theorem 1.3 Let $M \subset \mathbb{R}^d$ be a smooth, closed manifold with $\ell$ as the length of a shortest basis of $H_1(M)$. Given a set $P \subset M$ of $n$ points which is an $\varepsilon$-sample of $M$ and $4\varepsilon \le r \le \min\{\frac{1}{2}\sqrt{\frac{3}{5}}\,\rho(M), \rho_c(M)\}$, one can compute a set of loops $G$ in $O(n(n + n_e)^2(n_e + n_t))$ time where

$\frac{1}{1 + \frac{4r^2}{3\rho^2(M)}}\,\ell \le \mathrm{Len}(G) \le \left(1 + \frac{4\varepsilon}{r}\right)\ell$, and

$n_e$ and $n_t$ are the numbers of edges and triangles respectively in the Rips complex $\mathcal{R}^{2r}(P)$.



Here an $\varepsilon$-sample is not defined relative to reach or feature size, as is commonly done in the reconstruction literature [1, 7, 12].

The above result suggests that $\mathrm{Len}(G) \to \ell$ as $\varepsilon/r \to 0$ and $r \to 0$. To make $\varepsilon/r$ and $r$ approach 0 simultaneously, one may take $r = \Theta(\sqrt{\varepsilon})$ and let $\varepsilon \to 0$. We note that $n_e = O(n^2)$ and $n_t = O(n^3)$, giving an $O(n^8)$ worst-case complexity for the algorithm. However, if $r = \Theta(\varepsilon)$ and points in $P$ have $\Omega(\varepsilon)$ pairwise distance, $n_e$ and $n_t$ reduce to $O(n)$ by a result of [8]. In this case we get a time complexity of $O(n^4)$. In arriving at Theorem 1.3 we also prove the following result, which is of independent interest.

Theorem 1.4 Let $\mathcal{K}$ be a finite simplicial complex with non-negative weights on edges. A shortest basis for $H_1(\mathcal{K})$ can be computed in $O(n^4)$ time where $n$ is the size of $\mathcal{K}$.
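
For concreteness, the two running-time figures quoted after Theorem 1.3 can be checked by substituting the stated bounds on the numbers of edges and triangles into the bound $O(n(n + n_e)^2(n_e + n_t))$. The following worked arithmetic uses only the quantities named in the text:

```latex
\begin{align*}
n_e = O(n^2),\ n_t = O(n^3):\quad
  & n\,(n + n_e)^2\,(n_e + n_t) = O\big(n \cdot (n^2)^2 \cdot n^3\big) = O(n^8), \\
n_e = O(n),\ n_t = O(n):\quad
  & n\,(n + n_e)^2\,(n_e + n_t) = O\big(n \cdot n^2 \cdot n\big) = O(n^4).
\end{align*}
```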

2 Algorithm description 

The algorithm that we propose proceeds as follows. We compute a Rips complex $\mathcal{R}^{2r}(P)$ out of the given point cloud $P \subset M$. Next, we compute the rank $k$ of $H_1(M)$ by considering the persistent homology group

$H_1^{r,2r}(\mathcal{R}(P)) = \mathrm{image}(\iota_*)$

where the inclusion $\iota : \mathcal{R}^r(P) \hookrightarrow \mathcal{R}^{2r}(P)$ induces the homomorphism $\iota_* : H_1(\mathcal{R}^r(P)) \to H_1(\mathcal{R}^{2r}(P))$. It is known that the rank of $H_1^{r,2r}(\mathcal{R}(P))$ coincides with that of $H_1(M)$ for appropriate $r$.

We show that a shortest basis of $H_1^{r,2r}(\mathcal{R}(P))$ approximates a shortest basis of $H_1(M)$. Therefore, we aim to compute a shortest basis of $H_1^{r,2r}(\mathcal{R}(P))$ from $\mathcal{R}^r(P)$ and $\mathcal{R}^{2r}(P)$. To accomplish this, the algorithm augments $\mathcal{R}^{2r}(P)$ by putting a weight $w(e)$ on each edge $e \in \mathcal{R}^{2r}(P)$. The weights are of two types: either they are the lengths of the edges, or a very large value $W$ which is larger than $k$ times the total weight of $\mathcal{R}^r(P)$. Precisely, we set

$$w(e) = \begin{cases} \text{length of } e & \text{if } e \in \mathcal{R}^r(P), \\ W & \text{if } e \in \mathcal{R}^{2r}(P) \setminus \mathcal{R}^r(P). \end{cases}$$
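
The weighting rule above can be sketched in a few lines of Python. The function name `augmented_weights`, the dictionary representation, and the particular choice of $W$ (here $k$ times the total weight of the short edges, plus one) are assumptions made for illustration; the text only requires $W$ to exceed $k$ times the total weight of $\mathcal{R}^r(P)$.

```python
from math import dist


def augmented_weights(points, edges_2r, r, k):
    """Assign weights to the edges of the 2r-Rips complex.

    Edges that also belong to the r-Rips complex (length < r) keep their
    Euclidean length; every other edge receives the large value W.
    Returns (weights, W).
    """
    length = {e: dist(points[e[0]], points[e[1]]) for e in edges_2r}
    short = {e for e in edges_2r if length[e] < r}       # edges of the r-Rips complex
    total_short = sum(length[e] for e in short)
    W = k * total_short + 1.0   # hypothetical choice: any value > k * total_short works
    weights = {e: (length[e] if e in short else W) for e in edges_2r}
    return weights, W
```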

Let the complex $\mathcal{R}^{2r}(P)$ augmented with weights be denoted $\mathcal{R}^{2r+}(P)$. A shortest basis of $H_1(\mathcal{R}^{2r+}(P))$ does not necessarily form a shortest basis of $H_1^{r,2r}(\mathcal{R}(P))$. However, the first $k$ loops, sorted according to length, in a shortest basis of $H_1(\mathcal{R}^{2r+}(P))$ form a shortest basis of $H_1^{r,2r}(\mathcal{R}(P))$. We give an algorithm to compute a shortest basis for any simplicial complex, which we apply to $\mathcal{R}^{2r+}(P)$.

Since we are interested in computing the generators of the first homology group, it is sufficient to con- 
sider all simplices up to dimension two, that is, only vertices, edges, and triangles in the simplicial com- 
plexes that we deal with. Henceforth, we assume that all complexes that we consider have simplices up to 
dimension two. 

2.1 Computing loops 

We will prove later that a shortest basis for $H_1^{r,2r}(\mathcal{R}(P))$ indeed approximates a shortest basis for $H_1(M)$. The algorithm ShortLoop computes such a basis.

Algorithm 1 ShortLoop($P$, $r$)
1: Compute the Rips complex $\mathcal{R}^{2r}(P)$ and a weighted complex $\mathcal{R}^{2r+}(P)$ from it as described.
2: Compute the rank $k$ of $H_1^{r,2r}(\mathcal{R}(P))$ by the persistence algorithm.
3: Compute a shortest basis for $H_1(\mathcal{R}^{2r+}(P))$.
4: Return the first $k$ smallest loops from this shortest basis.

Theorem 2.1 The algorithm ShortLoop($P$, $r$) computes a shortest basis for the persistent homology group $H_1^{r,2r}(\mathcal{R}(P))$.

Proof: Let $g_1, \ldots, g_a$ be the generators computed in step 3, sorted in non-decreasing order of length. They generate $H_1(\mathcal{R}^{2r+}(P))$. Out of these generators the algorithm outputs the first $k$ generators $g_1, \ldots, g_k$. Since $k$ is the rank of $H_1^{r,2r}(\mathcal{R}(P))$, there are $k$ independent generators in $H_1(\mathcal{R}^r(P))$ which remain independent in $H_1(\mathcal{R}^{2r+}(P))$. We claim that the loops $g_1, \ldots, g_k$ reside in $\mathcal{R}^r(P)$. For if they do not, the sum of their lengths would be at least $W$, which is larger than $k$ times the total weight of $\mathcal{R}^r(P)$. Then any set of $k$ independent loops from $\mathcal{R}^r(P)$ which remain independent in $H_1(\mathcal{R}^{2r+}(P))$ can replace $g_1, \ldots, g_k$ to give a smaller length, so that $g_1, \ldots, g_a$ could not be a shortest basis of $H_1(\mathcal{R}^{2r+}(P))$.

The above argument implies that $g_1, \ldots, g_k$ is a basis of $H_1^{r,2r}(\mathcal{R}(P))$. If it is not a shortest basis, it can be replaced by a shorter one so that again we would have a basis of $H_1(\mathcal{R}^{2r+}(P))$ which is shorter than the one computed. This is a contradiction. ■

It remains to show how to compute a shortest basis of $H_1(\mathcal{R}^{2r+}(P))$ in step 3 of ShortLoop.

2.2 Shortest basis

Let $\mathcal{K}$ be any finite simplicial complex embedded in $\mathbb{R}^d$ whose edges have non-negative weights. To compute a shortest basis for $H_1(\mathcal{K})$ we make use of the fact that $H_1(\mathcal{K})$ is a vector space, as we restrict ourselves to $\mathbb{Z}_2$ coefficients. For such cases, Erickson and Whittlesey [19] observed that if a set of loops $\mathcal{L}$ in $\mathcal{K}$ contains a shortest basis, then the greedy set $G$ chosen from $\mathcal{L}$ is a shortest basis. The greedy set $G$ of $\mathcal{L}$ is an ordered set of loops $\{g_1, \ldots, g_k\}$, $k = \mathrm{rank}\, H_1(\mathcal{K})$, satisfying the following condition. The first element $g_1$ is the shortest loop in $\mathcal{L}$ which is nontrivial in $H_1(\mathcal{K})$. Suppose $g_1, \ldots, g_i$ have already been defined in the set $G$. The next chosen loop $g_{i+1}$ is the shortest loop in $\mathcal{L}$ which is independent of $g_1, \ldots, g_i$, that is, $[g_{i+1}]$ cannot be written as a linear combination of $[g_1], \ldots, [g_i]$. The check for independence is a costly step in this greedy algorithm, and we aim to reduce its cost. We construct a set of canonical loops which contains a basis of $H_1(\mathcal{K})$. This set is pruned by a persistence-based algorithm before applying the greedy algorithm.
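
The greedy selection described above can be sketched as follows. This is only an illustration under a simplifying assumption that is not made in the paper: each candidate loop is assumed to come with its homology class already expressed as a bitmask in some fixed $\mathbb{Z}_2$ coordinates, so that independence reduces to Gaussian elimination by XOR. The function name `greedy_basis` and this input format are hypothetical; the paper instead tests independence with the sealing technique of Section 2.2.2.

```python
def greedy_basis(loops):
    """Greedy selection of independent loops over Z2.

    loops: list of (length, class_bits) pairs, where class_bits is an int
    bitmask giving the loop's homology class in fixed Z2 coordinates
    (an assumed input format). Returns the chosen pairs in selection order.
    """
    pivots = {}    # leading bit position -> reduced vector having that leading bit
    chosen = []
    for length, bits in sorted(loops, key=lambda item: item[0]):
        v = bits
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                # v cannot be cancelled by earlier picks: the loop is independent
                pivots[top] = v
                chosen.append((length, bits))
                break
            v ^= pivots[top]   # eliminate the leading bit (addition over Z2)
        # if the loop above ends with v == 0, the class is trivial or dependent
    return chosen
```

Sorting by length first and keeping a loop only when its class survives elimination is exactly the greedy rule: each accepted loop is the shortest one independent of those accepted before it.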

2.2.1 Canonical loops 

We start by citing a result of Erickson and Whittlesey [19]. A simple cycle $L$ is tight if it contains a shortest path between every pair of points in $L$.

Proposition 2.2 With non-negative weights, every loop in a shortest basis of $H_1(\mathcal{K})$ is tight.

To collect all tight loops, we consider the canonical loops defined as follows. Let $T$ be a shortest path tree in $\mathcal{K}$ rooted at $p$. Notice that we are not assuming $T$ to be unique, but it is fixed once computed. For any two nodes $q_1, q_2 \in P$, let $\Pi_T(q_1, q_2)$ denote the unique path from $q_1$ to $q_2$ in $T$. Let $E_T$ be the set of edges in $T$ and $E$ the set of all edges of $\mathcal{K}$. Given a non-tree edge $e = (q_1, q_2) \in E \setminus E_T$, define the canonical loop of $e$ with respect to $p$, $c_p(e)$ in short, as the loop formed by concatenating $\Pi_T(p, q_1)$, $e$, and $\Pi_T(q_2, p)$, that is,

$c_p(e) = \Pi_T(p, q_1) \circ e \circ \Pi_T(q_2, p)$.

Let $C_p$ be the set of all canonical loops with respect to $p$, i.e., $C_p = \{c_p(e) : e \in E \setminus E_T\}$. Then we have the following easy consequence.

Proposition 2.3 $\bigcup_{p \in P} C_p$ contains all tight loops.
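
The construction of canonical loops can be sketched in Python as below: Dijkstra's algorithm produces a shortest path tree from the root, and each non-tree edge is closed up with the two tree paths to the root. The names `shortest_path_tree` and `canonical_loops`, and the convention that edges are stored as pairs `(u, v)` with `u < v`, are assumptions for illustration.

```python
import heapq


def shortest_path_tree(n, weights, root):
    """Dijkstra on an undirected graph with vertices 0..n-1.

    weights maps each edge (u, v), u < v, to a non-negative weight.
    Returns (dist, parent); parent[root] is None.
    """
    adj = {v: [] for v in range(n)}
    for (u, v), w in weights.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist, parent, done = {root: 0.0}, {root: None}, set()
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u]:
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, parent


def canonical_loops(n, weights, root):
    """Map each non-tree edge (q1, q2) to (length, vertex cycle) of its canonical loop."""
    dist, parent = shortest_path_tree(n, weights, root)

    def path_to_root(v):
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path                       # v, ..., root

    tree = {(min(v, p), max(v, p)) for v, p in parent.items() if p is not None}
    loops = {}
    for (q1, q2), w in weights.items():
        if (q1, q2) in tree or q1 not in dist or q2 not in dist:
            continue                      # skip tree edges and unreachable vertices
        cycle = path_to_root(q1)[::-1] + path_to_root(q2)   # root..q1, then q2..root
        loops[(q1, q2)] = (dist[q1] + w + dist[q2], cycle)
    return loops
```

Running `canonical_loops` once per vertex of the complex yields the sets $C_p$ whose union is considered in Proposition 2.3.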






Therefore $\bigcup_{p \in P} C_p$ is a set of loops from which the greedy set can be selected. However, $\bigcup_{p \in P} C_p$ can be a very large set, possibly containing many trivial loops which result in many unnecessary independence checks. To remedy this, we identify the greedy set $G_p$ of $C_p$ and choose the greedy set from the union $\bigcup_{p \in P} G_p$ instead of $\bigcup_{p \in P} C_p$. It turns out that $G_p$ can be computed by a persistence-based algorithm, thereby avoiding explicit independence checks.

If the lengths of the loops in $C_p$ are distinct, the greedy set $G_p$ is unique. However, in the presence of equal-length loops we need a mechanism to break ties. For this we introduce the notion of canonical order. We assign a unique number $\nu(e)$ between 1 and $m$ to each non-tree edge $e$ if there are $m$ of them. For any two non-tree edges $e$ and $e'$, let $e < e'$ if and only if either $\mathrm{Len}(c_p(e)) < \mathrm{Len}(c_p(e'))$, or $\mathrm{Len}(c_p(e)) = \mathrm{Len}(c_p(e'))$ and $\nu(e) < \nu(e')$. The total order imposed by '<' provides the canonical order

$e_1 < e_2 < \ldots < e_m$.

Based on this canonical order, we form the greedy set $G_p$ of $C_p$ as described in the beginning of Section 2.2. Below we argue that $\bigcup_{p \in P} G_p$ is good for our purpose and each set $G_p$ can be computed based on the persistence algorithm.
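
The canonical order amounts to a sort with a two-part key. In the sketch below, the numbering $\nu(e)$ is taken to be the rank of $e$ in the sorted list of edge tuples; this is one arbitrary but fixed choice, made only for illustration, and the function name `canonical_order` is hypothetical.

```python
def canonical_order(loop_lengths):
    """Return the non-tree edges in canonical order.

    loop_lengths maps each non-tree edge e to the length of its canonical loop.
    Ties in length are broken by nu(e), here the rank of e among the sorted
    edge tuples (an assumed, arbitrary numbering from 1 to m).
    """
    nu = {e: i + 1 for i, e in enumerate(sorted(loop_lengths))}
    return sorted(loop_lengths, key=lambda e: (loop_lengths[e], nu[e]))
```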

Proposition 2.4 The greedy set chosen from $\bigcup_{p \in P} G_p$ is a shortest basis of $H_1(\mathcal{K})$.

Proof: We show that $\bigcup_{p \in P} G_p$ contains a shortest basis of $H_1(\mathcal{K})$. Then the proposition follows by the argument delineated at the beginning of Section 2.2.

Consider all canonical loops $\bigcup_{p \in P} C_p$. Sort them in non-decreasing order of their lengths. If two loops have equal lengths and if there are points $p_i \in P$ for which both of them are in $C_{p_i}$, break the tie using the canonical order applied to the canonical loops for any one such point. Based on this order let $G$ be the greedy set from $\bigcup_{p \in P} C_p$. Proposition 2.2 and Proposition 2.3 imply that $\bigcup_{p \in P} C_p$ contains a shortest basis of $H_1(\mathcal{K})$ and thus $G$ is a shortest basis. Consider any loop $L$ in $G$. It is a canonical loop with respect to some $q \in P$ for which all loops appearing before $L$ in the canonical order precede it in the sorted sequence. The loop $L$ is independent of the loops in $\bigcup_{p \in P} C_p$ appearing before $L$, in particular independent of the loops in $C_q$ appearing before $L$ in the canonical order, which means $L \in G_q$. Therefore $\bigcup_{p \in P} G_p$ contains a shortest basis $G$ of $H_1(\mathcal{K})$. The proposition follows. ■

Motivated by the above observations, we formulate an algorithm CanonGen that computes the greedy 
set G p of C p . We note that, very recently, Chen and Freedman [9] proposed a similar algorithm which 
computes an approximation of a shortest basis of a simplicial complex rather than an optimal one. 

Algorithm 2 CanonGen($p$, $\mathcal{K}$)

1: Construct a shortest path tree $T$ in $\mathcal{K}$ with $p$ as the root. Let $E_T$ denote the set of tree edges.

2: For each non-tree edge $e = (q_1, q_2) \in E \setminus E_T$, let $c_p(e)$ be the canonical loop of $e$.

3: Perform the persistence algorithm based on the following filtration of $\mathcal{K}$: all the vertices in $P = \mathrm{Vert}(\mathcal{K})$, followed by all tree edges in $T$, followed by non-tree edges in the canonical order, and followed by all the triangles in $\mathcal{K}$. There are $k = \mathrm{rank}(H_1(\mathcal{K}))$ edges unpaired after the algorithm, and each of them is necessarily a non-tree edge. Return the set of canonical loops associated with them.
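
Step 3 of CanonGen can be sketched as a column reduction over $\mathbb{Z}_2$. The sketch below makes two simplifying assumptions that we state explicitly: the complex is connected, so the tree is spanning, and only the triangle columns are reduced, since the pairing of tree edges with vertices does not affect which non-tree edges remain unpaired. The function name and the input format (lists of vertex tuples) are hypothetical.

```python
def unpaired_nontree_edges(tree_edges, nontree_ordered, triangles):
    """Return the non-tree edges left unpaired by the persistence pairing over Z2.

    Edge filtration order: tree edges first, then non-tree edges in canonical
    order. Triangles are processed afterwards, in the order given.
    """
    order = {}
    for e in list(tree_edges) + list(nontree_ordered):
        order[tuple(sorted(e))] = len(order)

    paired = {}   # index of the youngest edge of a reduced column -> that column
    for a, b, c in triangles:
        # boundary of the triangle as a set of edge indices (a vector over Z2)
        col = {order[tuple(sorted(e))] for e in ((a, b), (a, c), (b, c))}
        while col:
            low = max(col)                # youngest edge in the current column
            if low not in paired:
                paired[low] = col         # the triangle kills the class born at `low`
                break
            col = col ^ paired[low]       # column addition over Z2
        # an empty column means the triangle creates a 2-cycle and pairs with no edge

    return [e for e in nontree_ordered if order[tuple(sorted(e))] not in paired]
```

The canonical loops of the returned edges are what CanonGen outputs. A reduced triangle column is a 1-cycle, so its youngest edge is always a non-tree edge, consistent with the remark in step 3.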



Proposition 2.5 CanonGen($p$, $\mathcal{K}$) outputs the greedy set $G_p$ chosen from $C_p$.

Proof: Let $\{e_1, e_2, \ldots, e_m\}$ be the non-tree edges of the shortest path tree $T$ listed in the canonical order. Let $G_p = \{c_p(e_1^*), c_p(e_2^*), \ldots, c_p(e_k^*)\}$. It suffices to show that $\{e_1^*, e_2^*, \ldots, e_k^*\}$ is the set of unpaired edges. Observe that for any $e_i^*$, $c_p(e_i^*)$ is independent of any subset of $\{c_p(e_j) : e_j < e_i^*\}$.

We prove the proposition by contradiction. Assume some $e_i^*$ gets paired by a triangle $t$ in the persistence algorithm. Let $\mathcal{K}_t$ denote the complex in the filtration right before $t$ is added. Let $f : \mathcal{K}_t \hookrightarrow \mathcal{K}$ be the inclusion map; it induces a homomorphism $f_* : H_1(\mathcal{K}_t) \to H_1(\mathcal{K})$. Let $[L]_t$ denote the homology class in $\mathcal{K}_t$ carried by the loop $L$. The boundary $\partial t$ uniquely determines a subset of unpaired positive edges $e'_1 < \cdots < e'_n$ in $\mathcal{K}_t$ such that $[\partial t]_t = [c_p(e'_1)]_t + \cdots + [c_p(e'_n)]_t$. The persistence algorithm [18] picks the youngest one from this subset to pair with $t$, i.e., $e_i^* = e'_n$. On the other hand, we have

$[c_p(e'_1)] + \cdots + [c_p(e'_{n-1})] + [c_p(e_i^*)] = f_*\big([c_p(e'_1)]_t + \cdots + [c_p(e'_{n-1})]_t + [c_p(e_i^*)]_t\big) = f_*([\partial t]_t) = 0$,

which means that $c_p(e_i^*)$ is dependent on a subset of $\{c_p(e_j) : e_j < e_i^*\}$. We reach a contradiction. ■

All previous results put together provide a greedy algorithm for computing a shortest basis of $H_1(\mathcal{K})$.

Algorithm 3 SPGen($\mathcal{K}$)
1: For each $p \in P = \mathrm{Vert}(\mathcal{K})$ compute $G_p$ := CanonGen($p$, $\mathcal{K}$). Let $k = |G_p|$.
2: Sort all loops in $\bigcup_{p} G_p$ by their lengths in increasing order. Let $g_1, \ldots, g_{k|P|}$ be this sorted list.
3: Initialize $G := \{g_1\}$.
4: for $i := 2$ to $k|P|$ do
5:   if $|G| = k$ then
6:     Exit the for loop.
7:   else if $g_i$ is independent of all loops in $G$ then
8:     Add $g_i$ to $G$.
9:   end if
10: end for
11: Return $G$.



2.2.2 Checking independence 

In step 7 of SPGen we need to determine if a generator $g$ is independent of all generators $g'_1, \ldots, g'_s$ so far selected in $G$. We obtain $g$ from running the persistence algorithm on a shortest-path-tree-based filtration for a point $p$ in step 3 of CanonGen. At the end of this persistence algorithm we must have gotten an unpaired edge, say $e$, where $c_p(e) = g$. To determine if $g$ is independent of all generators selected so far we adopt a sealing technique proposed in O. We fill $g'_1, \ldots, g'_s$ with triangles. The filling is done only combinatorially by choosing a dummy vertex, say $v$, and adding triangles $v v_i v_{i+1}$ for each edge $v_i v_{i+1}$ of the loops to be filled. Let $\mathcal{K}'$ be the new complex after adding these triangles and their edges to $\mathcal{K}$. In effect, these triangles and edges destroy the generators $g'_1, \ldots, g'_s$ from $\mathcal{K}$. They destroy the generator $g$ as well if and only if $g$ is dependent on $g'_1, \ldots, g'_s$. Since we are sealing according to the greedy order, the proof of Lemma 4.4 in O applies to establish this fact. Whether $g$ is rendered trivial or not can be determined as follows. We continue the persistence algorithm corresponding to the vertex $p$ with the addition of the simplices in $\mathcal{K}' \setminus \mathcal{K}$ and check if $e$ is now paired or not.

Let $n_v$, $n_e$, and $n_t$ denote the number of vertices, edges, and triangles respectively in $\mathcal{K}$. Notice that we add at most $n_e$ edges and triangles for sealing since the dummy vertex is added to at most $n_e$ edges to create new triangles in $\mathcal{K}'$.
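
The combinatorial filling step can be sketched as a cone construction. In the following Python sketch, each loop is assumed to be given as a closed vertex sequence (first vertex repeated at the end), and the dummy vertex is assumed to carry a label not used by the complex; the function name `seal_loops` and these conventions are ours.

```python
def seal_loops(loops, dummy):
    """Cone every edge of the given loops to a single dummy vertex.

    loops: iterable of closed vertex sequences [v0, v1, ..., v0].
    dummy: label of the new vertex, assumed not to occur in the complex.
    Returns (new_edges, new_triangles) to be added to the complex.
    """
    new_edges, new_triangles = set(), set()
    for loop in loops:
        for u, v in zip(loop, loop[1:]):          # consecutive vertices form an edge
            new_triangles.add(tuple(sorted((u, v))) + (dummy,))
            new_edges.add((u, dummy))
            new_edges.add((v, dummy))
    return new_edges, new_triangles
```

Because sets are used, an edge shared by several sealed loops contributes only one triangle, which matches the count of at most $n_e$ added triangles noted above.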

2.3 Time complexity 

First, we analyze the time complexity of CanonGen. Shortest path tree computation in step 1 of CanonGen takes $O(n_v \log n_v + n_e)$ time. The persistence algorithm for CanonGen can be implemented using matrix reductions [14] in time $O((n_v + n_e)^2 (n_e + n_t))$. This is because there are $n_v + n_e$ rows in this matrix and the insertion of each of the $n_e + n_t$ simplices can be implemented in $O(n_v + n_e)$ column operations, each taking $O(n_v + n_e)$ time. Therefore, CanonGen takes $O(n_v \log n_v + (n_v + n_e)^2 (n_e + n_t))$ time.

Step 1 of SPGen calls CanonGen $n_v$ times. Therefore, step 1 of SPGen takes $O(n_v^2 \log n_v + n_v (n_v + n_e)^2 (n_e + n_t))$ time. Step 2 of SPGen can be performed in $O(n_v k \log (n_v k))$ time where $k = O(n_e)$ is the rank of $H_1(\mathcal{K})$. The time complexity for the independence check in step 7 is dominated by the persistence algorithm, which is continued on $\mathcal{K}$ to accommodate simplices in $\mathcal{K}'$. Since we add $O(n_e)$ new simplices in $\mathcal{K}'$, it has the same asymptotic complexity as running the persistence algorithm on $\mathcal{K}$. We conclude that SPGen spends $O(n_v (n_v + n_e)^2 (n_e + n_t))$ time in total. If we take $n = |\mathcal{K}|$, this gives an $O(n^4)$ time complexity.

Now, we analyze the time complexity of ShortLoop, which is the main algorithm. Let $n_e$ and $n_t$ be the number of edges and triangles in $\mathcal{R}^{2r}(P)$ created out of $n$ points. Step 1 takes at most $O(n + n_e + n_t)$ time since we only compute edges and triangles of $\mathcal{R}^{2r}(P)$ out of $n$ points. Accounting for the persistence algorithm in step 2 and the time complexity of step 3, we get that ShortLoop takes

$O(n(n + n_e)^2(n_e + n_t))$ time.

The procedure SPGen($\mathcal{K}$) computes canonical sets $G_p$, which is ensured by Proposition 2.5. Then it forms a greedy set from these canonical sets, which is a shortest basis for $H_1(\mathcal{K})$ by Proposition 2.4. This and the time analysis for SPGen establish Theorem 1.4.

3 Approximation for M 

The algorithm SPGen is used in ShortLoop to produce a shortest basis for the persistent homology group $H_1^{r,2r}(\mathcal{R}(P))$. Proposition 3.5 in this section shows that a shortest basis of $H_1^{r,2r}(\mathcal{R}(P))$ coincides with a shortest basis in $H_1(\mathcal{C}^r(P))$. Therefore, if we show that a shortest basis in $H_1(\mathcal{C}^r(P))$ approximates a shortest basis in $H_1(M)$, we have the approximation result of Theorem 1.3.

3.1 Connecting M, Čech complex, and Rips complex

First, we note the following result established in [23], which connects $M$ with the union of the balls $P^r = \bigcup_{p \in P} B(p, r)$.

Proposition 3.1 Let $P \subset M$ be an $\varepsilon$-sample. If $2\varepsilon \le r \le \sqrt{\frac{3}{5}}\,\rho(M)$, there is a deformation retraction from $P^r$ to $M$ so that the corresponding retraction $t : P^r \to M$ has $t(B) \subset B$ for any ball $B \in \{B(p, r)\}_{p \in P}$.

Recall that $\mathcal{C}^{2r}(P)$ is the nerve of the cover $\{B(p, r)\}_{p \in P}$ of the space $P^r$. By a result of Leray [22], it is known that $P^r$ and $\mathcal{C}^{2r}(P)$ are homotopy equivalent. The next proposition follows from examining the specific equivalence maps used to prove the Nerve Lemma in Hatcher [20]. In particular, the simplices of the Čech complex are mapped to a subset of the union of the balls centered at their vertices; see the Appendix for its proof.

Proposition 3.2 There exists a homotopy equivalence $f : \mathcal{C}^{2r}(P) \to P^r$ such that for each simplex $\sigma \in \mathcal{C}^{2r}(P)$, one has $f(\sigma) \subset \bigcup_{p \in \mathrm{Vert}(\sigma)} B(p, r)$ and $f(p) = p$ for any $p \in P$.

The two propositions above together provide the connection between $M$ and the Čech complex:

Proposition 3.3 Let $P \subset M$ be an $\varepsilon$-sample. If $2\varepsilon \le r \le \sqrt{\frac{3}{5}}\,\rho(M)$, there is a homotopy equivalence map $h = t \circ f : \mathcal{C}^{2r}(P) \to M$ such that $h(\sigma) \subset M \cap \left(\bigcup_{p \in \mathrm{Vert}(\sigma)} B(p, r)\right)$ and $h(p) = p$ for any $p \in P$.

Now we establish a connection between Čech and Rips complexes which helps in proving Proposition 3.5.

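Proposition 3.4 Let $P \subset M$ be an $\varepsilon$-sample. Then, for $4\varepsilon \le r \le \frac{1}{2}\sqrt{\frac{3}{5}}\,\rho(M)$,

$H_1^{r,2r}(\mathcal{R}(P)) \approx H_1(\mathcal{C}^r(P)) \overset{j_{1*}}{\approx} H_1(\mathcal{C}^{2r}(P)) \overset{j_{2*}}{\approx} H_1(\mathcal{C}^{4r}(P))$,

where $j_{1*}$ and $j_{2*}$ are induced by the inclusion maps $j_1$ and $j_2$ respectively. Moreover, if

$\mathcal{C}^r(P) \overset{i_1}{\hookrightarrow} \mathcal{R}^r(P) \overset{i_2}{\hookrightarrow} \mathcal{C}^{2r}(P) \overset{i_3}{\hookrightarrow} \mathcal{R}^{2r}(P) \overset{i_4}{\hookrightarrow} \mathcal{C}^{4r}(P)$,

then $j_1 = i_2 \circ i_1$ and $j_2 = i_4 \circ i_3$, and $H_1^{r,2r}(\mathcal{R}(P)) = \mathrm{image}(\iota_*)$ where $\iota_* : H_1(\mathcal{R}^r(P)) \to H_1(\mathcal{R}^{2r}(P))$ is induced by the inclusion $\iota = i_3 \circ i_2$.

Proof: Based on Proposition 3.3, it can be proved by following the idea in [8] of intertwined Čech and Rips complexes. ■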
By definition the set of edges in $\mathcal{C}^r(P)$ is the same as the set of edges in $\mathcal{R}^r(P)$. This means a set of loops in $\mathcal{R}^r(P)$ also forms a set of loops in $\mathcal{C}^r(P)$. In light of Proposition 3.4 this implies:

Proposition 3.5 Let $P \subset M$ be an $\varepsilon$-sample and $4\varepsilon \le r \le \frac{1}{2}\sqrt{\frac{3}{5}}\,\rho(M)$. Then $H_1^{r,2r}(\mathcal{R}(P)) \approx H_1(M)$ and a basis for $H_1^{r,2r}(\mathcal{R}(P))$ is shortest if and only if it is shortest for $H_1(\mathcal{C}^r(P))$.

Proof: From Proposition 3.3 and Proposition 3.4 we have

$H_1^{r,2r}(\mathcal{R}(P)) \approx H_1(\mathcal{C}^r(P)) \approx H_1(M)$.

Let $A = \{a_1, \ldots, a_k\}$ be a shortest basis for $H_1^{r,2r}(\mathcal{R}(P))$. Each $a_i$ is a loop in $\mathcal{R}^r(P)$ and hence in $\mathcal{C}^r(P)$. Obviously $A$ is a basis of $H_1(\mathcal{C}^r(P))$ as the inclusion map from $\mathcal{C}^r(P)$ to $\mathcal{R}^r(P)$ induces a homomorphism. Thus, a shortest basis for $H_1(\mathcal{C}^r(P))$ must be no longer than that of $H_1^{r,2r}(\mathcal{R}(P))$. Similarly, if $A = \{a_1, \ldots, a_k\}$ is a shortest basis of $H_1(\mathcal{C}^r(P))$, then each $a_i$ must be in $\mathcal{R}^r(P)$ and survive in $\mathcal{R}^{2r}(P)$ as it must survive in $\mathcal{C}^{4r}(P)$. Thus $A$ is a basis for $H_1^{r,2r}(\mathcal{R}(P))$ and hence a shortest basis of $H_1^{r,2r}(\mathcal{R}(P))$ is no longer than that of $H_1(\mathcal{C}^r(P))$. This proves the proposition. ■

3.2 Bounding the lengths 

Our idea is to argue that a shortest basis of $H_1(\mathcal{C}^r(P))$ can be pulled back to a basis of $H_1(M)$ by the map $h$ of Proposition 3.3. We argue that the lengths of the generators cannot change too much in the process.

Let $g$ be any closed curve in $M$. Following 0, we define a procedure to approximate $g$ by a loop $\hat{g}$ in the 1-skeleton of $\mathcal{C}^r(P)$. This procedure, called the Decomposition method, is not part of our algorithm, but is used in our argument about length approximations of loops in $M$.

Decomposition method: If $\ell = \mathrm{Len}(g) > r - 2\varepsilon > 0$, we can write $\ell = \ell_0 + (\ell_1 + \ell_1 + \cdots + \ell_1) + \ell_0$ where $\ell_1 = r - 2\varepsilon$ and $r - 2\varepsilon > \ell_0 \ge (r - 2\varepsilon)/2$. Starting from an arbitrary point, say $x$, split $g$ into pieces whose lengths coincide with the decomposition of $\ell$. This produces a sequence of points

$x = x_0, x_1, \ldots, x_m$

along $g$ which divide it according to the length constraints. Because of our sampling condition, each point $x_i$ has a point $p_i \in P$ within $\varepsilon$ distance. We define a loop $\hat{g} = \{p_0 p_1 \ldots p_m\}$ with consecutive points joined by line segments. Proposition 3.6 shows that $\hat{g}$ resides in the 1-skeleton of $\mathcal{C}^r(P)$ (proof in the Appendix).
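
The length decomposition used by the Decomposition method can be computed directly. The sketch below returns the end-piece length, the number of full pieces, and the full-piece length for a given total length; the function name `decompose_length` and the returned triple are our own conventions, and the sketch covers only the arithmetic of the split, not the choice of sample points.

```python
def decompose_length(ell, r, eps):
    """Split ell as l0 + m * l1 + l0 with l1 = r - 2*eps and l1/2 <= l0 < l1.

    Returns (l0, m, l1). Requires ell > r - 2*eps > 0, as in the text.
    """
    l1 = r - 2 * eps
    if not (l1 > 0 and ell > l1):
        raise ValueError("requires Len(g) > r - 2*eps > 0")
    m = int(ell // l1) - 1        # number of full pieces of length l1
    l0 = (ell - m * l1) / 2       # the two end pieces share what remains
    return l0, m, l1
```

Since $\ell - m\ell_1$ lies in $[\ell_1, 2\ell_1)$ for this choice of $m$, the end pieces satisfy the stated constraint on $\ell_0$.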

Proposition 3.6 Given a closed curve $g$ on $M$ with $\mathrm{Len}(g) > r - 2\varepsilon > 0$, the Decomposition method finds a loop $\hat{g}$ from the 1-skeleton of $\mathcal{C}^r(P)$ such that $\mathrm{Len}(\hat{g}) \le \frac{r}{r - 2\varepsilon}\mathrm{Len}(g)$.

Consider a basis of $H_1(M)$ where each generator is a closed geodesic on $M$. For a smooth, compact manifold such a basis always exists by a well-known result in differential geometry [17]. Let $G = \{g_1, \ldots, g_k\}$ be this set of geodesic loops. By Proposition 3.6 we claim that there is a set of loops $\hat{G} = \{\hat{g}_1, \ldots, \hat{g}_k\}$ in $\mathcal{C}^r(P)$ whose length is within a small factor of the length of $G$. However, we need to show that $\hat{G}$ indeed generates $H_1(\mathcal{C}^r(P))$. We show this by mapping each $\hat{g}_j \in \hat{G}$ to $M$ by the homotopy equivalence $h$ and arguing that $[h(\hat{g}_j)] = [g_j]$ in $H_1(M)$. Since $h$ is a homotopy equivalence map, it follows that the isomorphism $h_* : H_1(\mathcal{C}^r(P)) \to H_1(M)$ maps the class $[\hat{g}_j]$ to $[g_j]$. This implies that $\hat{G}$ generates $H_1(\mathcal{C}^r(P))$.

To prove that $h(\hat{g}_j)$ is a representative of the class $[g_j]$, we consider a tubular neighborhood of $g_j$ of radius $r$, which is smaller than the convexity radius $\rho_c(M)$. Then, we show that each segment $p_i p_{i+1}$ of $\hat{g}_j$ is mapped to a curve $h(p_i p_{i+1})$ which lies within this tubular neighborhood. Because of this containment, $h(p_i p_{i+1})$ must be homotopic to the geodesic segment $\gamma(x_i, x_{i+1})$ of $g_j$. All these homotopies together provide a homotopy between $h(\hat{g}_j)$ and $g_j$. First we show that the tubular neighborhood of a segment of $g_j$ that we consider is indeed simply connected (see the Appendix for proof).

Proposition 3.7 Let $\gamma = \gamma(p, q)$ be a minimizing geodesic between two points $p, q \in M$. Consider its tubular neighborhood $\mathrm{Tub}_s(\gamma)$ on $M$ that consists of the points on $M$ within a geodesic distance $s$ from $\gamma$, i.e., $\mathrm{Tub}_s(\gamma) = \{x \in M : \min_{y \in \gamma} d_M(x, y) < s\}$. Then if $s \le \rho_c(M)$, $\mathrm{Tub}_s(\gamma)$ is contractible; in particular, $\mathrm{Tub}_s(\gamma)$ is simply connected.

Proposition 3.8 Let $P \subset M$ be an $\varepsilon$-sample and $4\varepsilon \le r \le \min\{\frac{1}{2}\sqrt{\frac{3}{5}}\,\rho(M), \rho_c(M)\}$. If $\hat{g}$ is the loop on $\mathcal{C}^r(P)$ constructed from a geodesic loop $g$ in $M$ by the Decomposition method, then $[h(\hat{g})] = [g]$, where $h$ is the homotopy equivalence defined in Proposition 3.3.

Proof: Since $g$ is a geodesic loop, it follows from standard results in differential geometry [?] that $\mathrm{Len}(g) \ge 2\rho_c(M)$.
Thus $\hat{g}$ can be constructed from the geodesic loop $g$ using the Decomposition method. Each vertex $p_i$
of $\hat{g}$ is within $\varepsilon$ Euclidean distance from the point $x_i$ in $g$. Next, notice that, since $C^r(P)$ uses balls of
radius $r/2$, the stated range of $r$ satisfies the condition of Proposition 3.3. By Proposition 3.3, for any point
$y$ on the segment $p_ip_{i+1}$, $h(y)$ is within $r/2$ Euclidean distance to either $p_i$ or $p_{i+1}$. This implies that $h(y)$
is within $r/2 + \varepsilon$ Euclidean distance, and hence, by Proposition 1.2, within $r$ geodesic distance to either $x_i$
or $x_{i+1}$. In addition, since the sub-curve of the geodesic loop $g$ between $x_i$ and $x_{i+1}$, denoted $g(x_i, x_{i+1})$, is
of length $l_1 = r - 2\varepsilon < \rho_c(M)$, $g(x_i, x_{i+1})$ is a minimizing geodesic between $x_i$ and $x_{i+1}$. Therefore
$h(p_ip_{i+1}) \subset \mathrm{Tub}_r(\gamma(x_i, x_{i+1}))$. In particular, the geodesics $\gamma(x_i, h(p_i))$ and $\gamma(x_{i+1}, h(p_{i+1}))$ reside in
$\mathrm{Tub}_r(\gamma(x_i, x_{i+1}))$.

Consider the loop formed by the three geodesic segments $\gamma(x_i, x_{i+1})$, $\gamma(x_i, h(p_i))$, $\gamma(x_{i+1}, h(p_{i+1}))$,
and the curve $h(p_ip_{i+1})$. From Proposition 3.7, this cycle is contractible in $M$ as it resides in $\mathrm{Tub}_r(\gamma(x_i, x_{i+1}))$.
In fact, there is a homotopy $H_i$ that takes $h(p_ip_{i+1})$ to $\gamma(x_i, x_{i+1})$ while $H_i$ keeps $h(p_i)$ and $h(p_{i+1})$ on the
geodesics $\gamma(x_i, p_i)$ and $\gamma(x_{i+1}, p_{i+1})$ respectively. We can combine all homotopies $H_i$ for $0 \le i < m$ to
define a homotopy between $h(\hat{g})$ and $g$. It follows that $[h(\hat{g})] = [g]$. ■
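
The passage from Euclidean to geodesic distance in the proof above can be checked by hand. The following worked bound is a sketch added for illustration, not part of the original argument. It assumes that Proposition 1.2 has the form $d_M(p,q) \le \bigl(1 + \tfrac{4d^2(p,q)}{3\rho^2(M)}\bigr)d(p,q)$ (the form used in the proof of Proposition 3.10) and that $r \le \tfrac{1}{2}\sqrt{3/5}\,\rho(M)$. Without loss of generality $p_i$ is the endpoint closest to $h(y)$.

```latex
\begin{align*}
d(h(y), x_i) &\le d(h(y), p_i) + d(p_i, x_i)
   \le \tfrac{r}{2} + \varepsilon \le \tfrac{3r}{4}
   && (\text{since } \varepsilon \le r/4),\\
d_M(h(y), x_i) &\le \Bigl(1 + \frac{4\,(3r/4)^2}{3\rho^2(M)}\Bigr)\frac{3r}{4}
   = \Bigl(1 + \frac{3r^2}{4\rho^2(M)}\Bigr)\frac{3r}{4}\\
   &\le \Bigl(1 + \frac{9}{80}\Bigr)\frac{3r}{4} = \frac{267}{320}\,r < r
   && (\text{since } r^2 \le \tfrac{3}{20}\,\rho^2(M)).
\end{align*}
```

If Proposition 1.2 additionally requires $d(p,q) \le \rho(M)/2$, as the use of $l \le 2d$ in its proof suggests, that condition also holds here because $3r/4 < 0.3\,\rho(M)$.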

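Proposition 3.9 Let $P \subset M$ be an $\varepsilon$-sample and $4\varepsilon \le r \le \min\{\tfrac{1}{2}\sqrt{\tfrac{3}{5}}\,\rho(M),\ \rho_c(M)\}$. If $G = \{g_1, \dots, g_k\}$
and $G' = \{g'_1, \dots, g'_k\}$ are the generators of a shortest basis of $H_1(M)$ and $H_1(C^r(P))$ respectively, then
we have $\mathrm{Len}(G') \le (1 + \tfrac{4\varepsilon}{r})\,\mathrm{Len}(G)$.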

Proof: It is obvious that any $g_i \in G$ must be a geodesic loop. Let $\hat{g}_i$ be the loop constructed from $g_i$ by the Decomposition
method in the 1-skeleton of $C^r(P)$. Thus, we have a set $\hat{G} = \{\hat{g}_1, \dots, \hat{g}_k\}$. By Proposition 3.8, there is
a homotopy equivalence $h : C^r(P) \to M$ so that $[h(\hat{g}_i)] = [g_i]$, which means that $\hat{G}$ is also a basis of
$H_1(C^r(P))$. By Proposition 3.6,

\[ \mathrm{Len}(G') \le \mathrm{Len}(\hat{G}) \le \frac{r}{r - 2\varepsilon}\,\mathrm{Len}(G) \le \Bigl(1 + \frac{4\varepsilon}{r}\Bigr)\mathrm{Len}(G). \]
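
The last inequality in this chain uses only the sampling condition $r \ge 4\varepsilon$. The short derivation below is added as a check and assumes the middle factor is $\tfrac{r}{r-2\varepsilon}$, the length bound of the Decomposition method.

```latex
\[
\frac{r}{r - 2\varepsilon}
  = 1 + \frac{2\varepsilon}{r - 2\varepsilon}
  \le 1 + \frac{2\varepsilon}{r/2}
  = 1 + \frac{4\varepsilon}{r},
\qquad \text{because } r \ge 4\varepsilon \;\Longrightarrow\; r - 2\varepsilon \ge \frac{r}{2}.
\]
```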
We now consider the opposite direction, and provide a lower bound for the total length of a shortest
basis of $H_1(C^r(P))$ in terms of the length of a shortest basis of $H_1(M)$.

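Proposition 3.10 Let $P \subset M$ be an $\varepsilon$-sample and $4\varepsilon \le r \le \min\{\tfrac{1}{2}\sqrt{\tfrac{3}{5}}\,\rho(M),\ \rho_c(M)\}$. Let $G$ and $G'$ be
defined as in Proposition 3.9. We have $\mathrm{Len}(G) \le \bigl(1 + \tfrac{4r^2}{3\rho^2(M)}\bigr)\mathrm{Len}(G')$.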

Proof: We construct a set of loops $\bar{G}$ in $M$ from $G'$. First, we show that the length of these loops is at most
$\bigl(1 + \tfrac{4r^2}{3\rho^2(M)}\bigr)$ times the length of $G'$. Next, we show that the constructed loops generate $H_1(M)$.

For each loop $g' \in G'$, we construct $\bar{g}$ as follows. The vertices and edges of $g'$ are vertices and
edges of $C^r(P)$. For an edge $e = pq \in g'$, we have $p, q \in P$ and thus $p, q \in M$. We connect $p$ and $q$ by the geodesic
$\gamma(p, q)$ on $M$, and map $e$ to this geodesic. Mapping each edge of $g'$ onto $M$, we obtain $\bar{g}$. Thus we obtain a
set $\bar{G} = \{\bar{g}_1, \dots, \bar{g}_k\}$. By Proposition 1.2, $d_M(p, q) \le \bigl(1 + \tfrac{4d^2(p,q)}{3\rho^2(M)}\bigr)d(p, q) \le \bigl(1 + \tfrac{4r^2}{3\rho^2(M)}\bigr)d(p, q)$. Hence
the length bound follows.

We now show that the set $\bar{G}$ is a basis of $H_1(M)$. Consider mapping $g'_j \in G'$ to $M$ by the equivalence map
$h$. Each edge $e = pq \in g'_j$ is mapped to a curve $h(pq)$. From Proposition 3.3, we have that $h(p) = p$ and
$h(q) = q$, and each point of $h(pq)$ is within $r/2$ Euclidean distance and hence $r$ geodesic distance to either
$p$ or $q$. This implies that $h(pq) \subset \mathrm{Tub}_r(\gamma(p, q))$. Then, by using a similar argument as in Proposition 3.8,
we claim that $\gamma(p, q)$ and $h(pq)$ are homotopic. Combining the homotopies for all edges of $g'_j$, we get that
$h(g'_j)$ is homotopic to $\bar{g}_j$. Since $h$ is a homotopy equivalence, $h(G')$ and hence $\bar{G} = \{\bar{g}_1, \dots, \bar{g}_k\}$ are a basis
of $H_1(M)$. Therefore,

\[ \mathrm{Len}(G) \le \mathrm{Len}(\bar{G}) \le \Bigl(1 + \frac{4r^2}{3\rho^2(M)}\Bigr)\mathrm{Len}(G'). \]

■

For an appropriate range of $r$, shortest bases of $H_1(C^r(P))$ and $H_1^{r,2r}(\mathcal{R}(P))$ are the same by Proposition 3.5.



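Theorem 3.11 Let $P \subset M$ be an $\varepsilon$-sample and $4\varepsilon \le r \le \min\{\tfrac{1}{2}\sqrt{\tfrac{3}{5}}\,\rho(M),\ \rho_c(M)\}$. Let $G$ and $G'$
be a shortest basis of $H_1(M)$ and $H_1^{r,2r}(\mathcal{R}(P))$ respectively. We have

\[ \frac{1}{1 + \frac{4r^2}{3\rho^2(M)}}\,\mathrm{Len}(G) \;\le\; \mathrm{Len}(G') \;\le\; \Bigl(1 + \frac{4\varepsilon}{r}\Bigr)\mathrm{Len}(G). \]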

Theorem 1.3 follows from Theorem 3.11, Theorem 2.1, and the time complexity analysis in Section 2.3.
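
To get a feel for how tight the two-sided bound of Theorem 3.11 is, the factors can be evaluated numerically. The snippet below is a minimal sketch, not part of the algorithm. The function name `basis_length_factors` and its parameters are hypothetical, the admissible range of $r$ is taken to be $4\varepsilon \le r \le \min\{\tfrac{1}{2}\sqrt{3/5}\,\rho(M), \rho_c(M)\}$, and the convexity radius defaults to infinity only for convenience.

```python
import math


def basis_length_factors(eps, r, rho, rho_c=math.inf):
    """Return (lower, upper) with lower*Len(G) <= Len(G') <= upper*Len(G).

    eps   : sampling density (P is assumed to be an eps-sample of M)
    r     : scale parameter of the complexes
    rho   : rho(M) from the theorem
    rho_c : convexity radius rho_c(M); infinity if unknown (an assumption)
    """
    r_max = min(0.5 * math.sqrt(3.0 / 5.0) * rho, rho_c)
    if not (0.0 < 4.0 * eps <= r <= r_max):
        raise ValueError("need 4*eps <= r <= min(0.5*sqrt(3/5)*rho, rho_c)")
    lower = 1.0 / (1.0 + 4.0 * r**2 / (3.0 * rho**2))
    upper = 1.0 + 4.0 * eps / r
    return lower, upper


if __name__ == "__main__":
    # A dense sample (eps = 0.01) at scale r = 0.1 on a manifold with rho = 1.
    lo, hi = basis_length_factors(eps=0.01, r=0.1, rho=1.0)
    print(f"{lo:.4f} * Len(G) <= Len(G') <= {hi:.4f} * Len(G)")
```

With these inputs the upper factor is about 1.4 and the lower factor is about 0.987. At the smallest admissible scale $r = 4\varepsilon$ the upper factor is 2, so a good approximation needs $r$ to be well above $4\varepsilon$ while staying small relative to $\rho(M)$.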



4 Conclusions 

We have given a polynomial time algorithm for approximating a shortest basis of the first homology group
of a smooth manifold from point data. We have also presented an algorithm to compute a shortest basis
for the first homology of any finite simplicial complex. The question of computing a shortest basis for other
homology groups under $\mathbb{Z}_2$ has been recently settled by Chen and Freedman [11], who show it to be NP-hard.

We use Rips complexes for computations and Čech complexes for analysis. One may observe that
Čech complexes can be used directly in the algorithm. Since we know that $C^r(P)$ is homotopy equivalent
to $M$ for an appropriate range of $r$, we can compute a shortest basis for $H_1(C^r(P))$, which can be shown
to approximate a shortest basis for $H_1(M)$ using our analysis. In technical terms, this removes the
weighting in step 1 and also step 4 of the ShortLoop algorithm, and makes Theorem 2.1 and Proposition 3.5
redundant. Although the worst-case time complexity is not affected, computing the triangles
of Čech complexes is numerically harder in high dimensions than computing those of Rips complexes.
This is why we chose to describe an algorithm using Rips complexes.
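
The difference between the two triangle tests can be illustrated with a small sketch. It is not the implementation used for the paper. It assumes that a triangle belongs to the Rips complex at scale $r$ when all pairwise distances are at most $r$, and to the Čech complex $C^r(P)$ when the three balls of radius $r/2$ have a common point, that is, when the minimum enclosing ball of the three points has radius at most $r/2$. The function names are hypothetical.

```python
import math
from itertools import combinations


def in_rips(tri, r):
    """Rips test: only pairwise distances are needed."""
    return all(math.dist(p, q) <= r for p, q in combinations(tri, 2))


def min_enclosing_radius(tri):
    """Radius of the smallest ball containing three points of R^d."""
    a, b, c = sorted(math.dist(p, q) for p, q in combinations(tri, 2))
    if c * c >= a * a + b * b:
        # Right, obtuse, or degenerate triangle: the ball is spanned by
        # the longest edge.
        return c / 2.0
    # Acute triangle: the ball is the circumscribed ball, R = abc / (4 * area),
    # with the area computed by Heron's formula.
    s = (a + b + c) / 2.0
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))
    return a * b * c / (4.0 * area)


def in_cech(tri, r):
    """Cech test for balls of radius r/2: needs a miniball computation."""
    return min_enclosing_radius(tri) <= r / 2.0


if __name__ == "__main__":
    # Equilateral triangle with unit sides, embedded in R^3.
    tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, math.sqrt(3.0) / 2.0, 0.0)]
    print(in_rips(tri, 1.01), in_cech(tri, 1.01), in_cech(tri, 1.2))
```

For the unit equilateral triangle the enclosing radius is $1/\sqrt{3} \approx 0.577$, so at scale $r = 1.01$ the triangle is in the Rips complex but not in the Čech complex. The Rips test is a comparison of distances, while the Čech test involves a square root of a product that becomes ill-conditioned for nearly degenerate triangles, which is one way to read the numerical remark above.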
Appendix

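Proof of Proposition 1.2. Let $\gamma(t)$ be the minimizing geodesic between $p$ and $q$ parameterized by
arclength and set $l = d_M(p, q)$. By Proposition 6.3 in [23] we have that $l \le 2d(p, q)$. Let $u_t = \dot{\gamma}(t)$ be the
unit tangent vector of $\gamma$ at $t$. We have $t = d_M(p, \gamma(t))$.

Let $B : T_{\gamma(t)}M \times T_{\gamma(t)}M \to T^{\perp}_{\gamma(t)}M$ be the second fundamental form associated with the manifold $M$. Since
$\gamma$ is a geodesic, $du_t/dt = B(u_t, u_t) = \ddot{\gamma}(t)$. Write $\rho = \rho(M)$ and $d = d(p, q)$ for convenience. From
Proposition 6.1 in [23], we have the norm $\|\ddot{\gamma}(t)\| \le 1/\rho$ as the norm of the second fundamental form is
bounded by $1/\rho$ in all directions, and thus $\|du_t/dt\| \le 1/\rho$. Hence we have that

\[ \angle(u_t, u_p) \le \int_{[0,t]} \Bigl\|\frac{du_s}{ds}\Bigr\|\,ds \le \frac{t}{\rho}, \qquad\text{and thus}\qquad \sin\frac{\angle(u_t, u_p)}{2} \le \frac{t}{2\rho}. \]

Furthermore, let $u \cdot v$ denote the dot-product between vectors $u$ and $v$. Then we have that

\[ (q - p)\cdot u_p = \int_{[0,l]} u_t \cdot u_p\,dt = \int_{[0,l]} \cos\angle(u_t, u_p)\,dt = \int_{[0,l]} \Bigl(1 - 2\sin^2\frac{\angle(u_t, u_p)}{2}\Bigr)dt \ge \int_{[0,l]} \Bigl(1 - \frac{t^2}{2\rho^2}\Bigr)dt = l - \frac{l^3}{6\rho^2}. \]

Therefore,

\[ d = \|p - q\| \ge (q - p)\cdot u_p \ge l - \frac{l^3}{6\rho^2} \;\Longrightarrow\; l \le d + \frac{l^3}{6\rho^2} \le d + \frac{4d^3}{3\rho^2}. \]

The last inequality follows from the fact that $l \le 2d$. This proves the proposition.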
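
As a sanity check of the bound $d_M(p,q) \le d + \tfrac{4d^3}{3\rho^2}$, one can test it on a circle, where both distances are known in closed form. The snippet below is a sketch added for illustration. It assumes that $\rho(M)$ is the reach, which equals the radius for a circle, and it restricts to chords with $d \le \rho/2$, the regime in which $l \le 2d$ is assumed to hold. All function names are hypothetical.

```python
import math


def chord_and_arc(theta, rho=1.0):
    """Euclidean (chord) and geodesic (arc) distance between two points
    of a circle of radius rho separated by the angle theta."""
    d = 2.0 * rho * math.sin(theta / 2.0)
    l = rho * theta
    return d, l


def geodesic_bound(d, rho=1.0):
    """Right-hand side of the bound: (1 + 4 d^2 / (3 rho^2)) * d."""
    return (1.0 + 4.0 * d**2 / (3.0 * rho**2)) * d


def worst_ratio(n, rho=1.0):
    """Largest value of arc / bound over n chords with d <= rho / 2."""
    theta_max = 2.0 * math.asin(0.25)  # the chord equals rho / 2 at this angle
    worst = 0.0
    for k in range(1, n + 1):
        d, l = chord_and_arc(theta_max * k / n, rho)
        worst = max(worst, l / geodesic_bound(d, rho))
    return worst


if __name__ == "__main__":
    print(worst_ratio(1000))  # stays below 1 if the bound holds on the circle
```

On the circle the ratio stays strictly below 1, which is expected because the arc length exceeds the chord only by a term of order $d^3/(24\rho^2)$, well within the $4d^3/(3\rho^2)$ allowed by the proposition.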



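Proof of Proposition 3.6. From the construction and sampling condition, it follows that, for $1 \le i \le m - 2$,

\begin{align*}
d(p_i, p_{i+1}) &\le d(x_i, p_i) + d(x_i, x_{i+1}) + d(x_{i+1}, p_{i+1}) \\
&\le 2\varepsilon + l_1 = r = \frac{r}{r - 2\varepsilon}\,l_1.
\end{align*}

Similarly,

\[ d(p_0, p_1) \le \frac{r}{r - 2\varepsilon}\,l_0 \quad\text{and}\quad d(p_{m-1}, p_0) \le \frac{r}{r - 2\varepsilon}\,l_0. \]

Since $\frac{r}{r - 2\varepsilon}\,l_0 \le r$, each edge $p_ip_{i+1}$ belongs to $C^r(P)$. Therefore, we obtain a loop $\hat{g} = p_0p_1 \dots p_m$ in the
1-skeleton of $C^r(P)$ whose length satisfies:

\[ \mathrm{Len}(\hat{g}) = \sum_{i=0}^{m-1} d(p_i, p_{i+1}) \le \frac{r}{r - 2\varepsilon}\,\mathrm{Len}(g). \]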



Proof of Proposition 3.7. We show that $\mathrm{Tub}_s(\gamma)$ deformation retracts to $\gamma$. For any point $x \in
\mathrm{Tub}_s(\gamma)$, consider a geodesic ball $B$ of radius $s$ centered at $x$. Since $s$ is less than the convexity radius, $\gamma \cap B$ has a unique
point $x_m$ which is at a minimum geodesic distance from $x$. Consider the retraction map $t : \mathrm{Tub}_s(\gamma) \to \gamma$
where $t(x) = x_m$. One can construct a deformation retraction that deforms the identity on $\mathrm{Tub}_s(\gamma)$ to $t$ by
moving each point $x$ along the minimizing geodesic path that connects $x$ to $x_m$ in $\gamma$. ■

Proof of Proposition 3.2. The proof is based on that of the Nerve Lemma in [20] (Chapter 4.G). Let $\Gamma$
be the barycentric subdivision of $C^{2r}(P)$. Taking the definitions of the maps $\Delta p$, $\Delta q$, and the space $\Delta P^r$
from Hatcher [20], we consider the following sequence

\[ C^{2r}(P) \xrightarrow{\;h\;} \Gamma \underset{\Delta p}{\overset{\Delta q}{\rightleftarrows}} \Delta P^r \xrightarrow{\;\pi\;} P^r. \tag{1} \]

We prove the proposition by showing that $f = \pi \circ \Delta q \circ h$ is a homotopy equivalence. We first introduce
the concept of mapping cylinder. For a map $f : X \to Y$, the mapping cylinder $M_f$ is the quotient space
of the disjoint union $(X \times I) \sqcup Y$ with $(x, 1)$ identified with $f(x) \in Y$, denoted $M_f = X \sqcup_f Y$, see
Figure 1(a). It is obvious that $M_f$ deformation retracts to $Y$. It is also well-known that $f$ is a homotopy
equivalence map if and only if $M_f$ deformation retracts to $X$, see Figure 1(b), where the map $g = e_X \circ i_Y$
is a homotopy equivalence map from $Y$ to $X$.

Figure 1: (a) the mapping cylinder $M_f = X \sqcup_f Y$ (courtesy of Hatcher [20]); (b) the maps among $X$, $Y$
and $M_f$.
Figure 2: Illustration of the maps and the spaces involved in Eq. (1).



We are now ready to explain each map in the composition of the map $f$. $\Gamma$ is the barycentric subdivision
of $C^{2r}(P)$. Thus $h$ is an identity map between the underlying spaces of $C^{2r}(P)$ and $\Gamma$. Index the points in
$P = \{p_i\}_{i=1}^{n}$ arbitrarily. Let $B_i = B(p_i, r)$. To facilitate the argument, label the vertices in $\Gamma$ using the $B_i$'s
and their finite intersections, see Figure 2. Each edge (one-simplex) in $\Gamma$ is associated with an inclusion
map, which induces a sequence of inclusion maps over a simplex of any dimension in $\Gamma$.

$\Delta P^r$ can be realized using the concept of mapping cylinder, see the top right most picture in Figure 2.
The sequence of inclusion maps associated with each simplex in $\Gamma$,

\[ (B_{i_0} \cap \cdots \cap B_{i_n}) \hookrightarrow (B_{i_0} \cap \cdots \cap B_{i_{n-1}}) \hookrightarrow \cdots \hookrightarrow (B_{i_0} \cap \cdots \cap B_{i_{n-k}}), \]

induces an iterated mapping cylinder. $\Delta P^r$ is obtained by gluing these iterated mapping cylinders over
all simplices in $\Gamma$; each simplex $\Delta^k$ having $p_i$ as a vertex in the barycentric subdivision of $\sigma$ is mapped into
$B(p_i, r)$ under $f$ along their boundaries, see [20] for details. There is a canonical projection $\Delta p : \Delta P^r \to \Gamma$
induced by projecting each finite intersection to its corresponding vertex in $\Gamma$. Consider the mapping cylinder
$M_{\Delta p}$. The Nerve Lemma is proved in [20] by showing that $M_{\Delta p}$ deformation retracts to $\Delta P^r$. In fact, the
deformation retraction described there maps a simplex $\Delta^k \in \Gamma$ to the part of $\Delta P^r$ defined over the same
$\Delta^k$; namely, $\Delta q = e_{\Delta P^r} \circ i_{\Gamma}$ is a homotopy equivalence and maps a simplex $\Delta^k \in \Gamma$ into the iterated
mapping cylinder defined by the sequence of inclusion maps associated with $\Delta^k$.

On the other hand, $\Delta P^r$ can also be considered as the quotient space of the disjoint union of all the
products $B_{i_0} \cap \cdots \cap B_{i_n} \times \Delta^n$, as the subscripts range over sets of $n + 1$ distinct indices and any $n \ge 0$,
with the identifications over the faces of $\Delta^n$ using inclusions $B_{i_0} \cap \cdots \cap B_{i_n} \hookrightarrow B_{i_0} \cap \cdots \cap \widehat{B}_{i_j} \cap \cdots \cap B_{i_n}$,
where the hat means the corresponding term is missing. From this viewpoint, any point $x \in P^r$ has a fiber $\pi^{-1}(x)$
in $\Delta P^r$ defined as follows: $\pi^{-1}(x) = \{\sum_i t_i x_i\}$ where $\sum_i t_i = 1$ and $t_i \ge 0$, and $x_i$ is a copy of $x$ in
$B_i$ for those $B_i$ containing $x$, see the bottom left most picture in Figure 2. It is easy to see that $P^r$ can be
embedded into $\Delta P^r$ as a section of $\Delta P^r$; in particular $\pi$ is a homotopy equivalence. Thus $f$ is a homotopy
equivalence.
Observe that each point $y$ in an iterated mapping cylinder over some simplex $\Delta^k = (B_{i_0} \cap \cdots \cap
B_{i_n}, \cdots, B_{i_0} \cap \cdots \cap B_{i_{n-k}})$ in $\Gamma$ is in the fiber $\pi^{-1}(x)$ for some $x$ in $B_{i_0}$. In other words, if $\Delta^k$ is in
the closure of the star of a point $p \in P$ in $\Gamma$, then any point $y$ in the iterated mapping cylinder over $\Delta^k$ is
in the fiber of a point $x \in B(p, r)$. Now consider a simplex $\sigma \in C^{2r}(P)$. Any simplex in its barycentric
subdivision must be in the closure of the star of some vertex of $\sigma$. Thus $\sigma$, under the map $\Delta q \circ h$, is mapped
into the union of the iterated mapping cylinders defined over the simplices in the barycentric subdivision of
$\sigma$, and its image, under the map $\pi$, is further mapped into $\bigcup_{p \in \mathrm{Vert}(\sigma)} B(p, r)$.

In addition, it is clear that the map $f$ can fix each vertex in $C^{2r}(P)$. This proves the proposition. ■